146 research outputs found

    Nonparametric Bayesian attentive video analysis

    We address the problem of object-based visual attention from a Bayesian standpoint. We tackle the issue of joint segmentation and saliency computation, suitable to provide a sound basis for dealing with higher-level information related to objects present in a dynamic scene. To this end we propose a framework relying on nonparametric Bayesian techniques, namely variational inference on a mixture of Dirichlet processes.
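
    As a minimal sketch of the technique named above, a Dirichlet process mixture fitted by variational inference can be approximated with scikit-learn's BayesianGaussianMixture (a truncated variational DP); the pixel features below are a hypothetical stand-in for the paper's actual descriptors and pipeline.

```python
# Sketch: variational inference on a Dirichlet process mixture, via
# scikit-learn's truncated DP approximation. The (x, y, r, g, b) pixel
# features are an illustrative stand-in for the paper's descriptors.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
features = rng.random((500, 5))  # toy "frame": 500 pixels, 5 features each

dpgmm = BayesianGaussianMixture(
    n_components=10,  # truncation level of the DP
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    max_iter=200,
    random_state=0,
)
labels = dpgmm.fit_predict(features)  # joint soft segmentation of the frame
print("active components:", np.unique(labels).size)
```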

    How to look next? A data-driven approach for scanpath prediction

    By and large, current visual attention models mostly rely, when considering static stimuli, on the following procedure. Given an image, a saliency map is computed, which, in turn, might serve the purpose of predicting a sequence of gaze shifts, namely a scanpath instantiating the dynamics of visual attention deployment. The temporal pattern of attention unfolding is thus confined to the scanpath generation stage, whilst salience is conceived as a static map, at best conflating a number of factors (bottom-up information, top-down cues, spatial biases, etc.). In this note we propose a novel sequential scheme that consists of three processing stages relying on a center-bias model, a context/layout model, and an object-based model, respectively. Each stage contributes, at different times, to the sequential sampling of the final scanpath. We compare the method against classic scanpath generation that exploits a state-of-the-art static saliency model. Results show that accounting for the structure of the temporal unfolding leads to gaze dynamics close to human gaze behaviour.
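
    A rough illustration of the three-stage sequential scheme described above: each fixation is sampled from a time-varying blend of a center-bias map, a context map, and an object map. The maps, stage weights, and sampling rule here are assumptions, not the paper's actual formulation.

```python
# Sketch: sampling a scanpath from a time-varying mixture of three maps.
# The maps and the stage-weight schedule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
H = W = 64

def center_bias(h, w, sigma=0.25):
    """Gaussian bump centred on the image, a common center-bias prior."""
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (ys / h - 0.5) ** 2 + (xs / w - 0.5) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

center = center_bias(H, W)
context = rng.random((H, W))  # stand-in for a context/layout map
objects = rng.random((H, W))  # stand-in for an object-based map

def sample_fixation(weights):
    """Draw one gaze position from the weighted blend of the three maps."""
    blend = weights[0] * center + weights[1] * context + weights[2] * objects
    p = (blend / blend.sum()).ravel()
    idx = rng.choice(p.size, p=p)
    return divmod(idx, W)  # (row, col) of the sampled fixation

# Early fixations dominated by the center bias, later ones by objects.
schedule = [(0.8, 0.15, 0.05), (0.3, 0.4, 0.3), (0.05, 0.25, 0.7)]
scanpath = [sample_fixation(w) for w in schedule for _ in range(3)]
print(scanpath)
```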

    Improving the accuracy of automatic facial expression recognition in speaking subjects with deep learning

    When automatic facial expression recognition is applied to video sequences of speaking subjects, the recognition accuracy has been noted to be lower than with video sequences of still subjects. This effect, known as the speaking effect, arises during spontaneous conversations, where the speech articulation process influences facial configurations along with the affective expressions. In this work we question whether, aside from facial features, other cues relating to the articulation process would increase emotion recognition accuracy when provided as additional input to a deep neural network model. We develop two neural networks that classify facial expressions in speaking subjects from the RAVDESS dataset: a spatio-temporal CNN and a GRU cell RNN. They are first trained on facial features only, and afterwards on both facial features and articulation-related cues extracted from a model trained for lip reading, while also varying the number of consecutive input frames. We show that the addition of articulation-related features increases classification accuracy by up to 12%, the increase being greater the more consecutive frames are provided as input to the model.
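
    A minimal sketch of the second model type, a GRU RNN over per-frame features, where facial and articulation feature vectors are simply concatenated at each time step. The feature dimensions, hidden size, and fusion-by-concatenation are assumptions; RAVDESS's 8 emotion classes are from the dataset.

```python
# Sketch: GRU classifier over concatenated facial + articulation features.
# Dimensions and the concatenation fusion are illustrative choices; the
# articulation features would come from a pretrained lip-reading network.
import torch
import torch.nn as nn

class SpeakingExpressionGRU(nn.Module):
    def __init__(self, face_dim=136, artic_dim=512, hidden=128, n_classes=8):
        super().__init__()
        self.gru = nn.GRU(face_dim + artic_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, face_feats, artic_feats):
        # face_feats:  (batch, frames, face_dim)
        # artic_feats: (batch, frames, artic_dim)
        x = torch.cat([face_feats, artic_feats], dim=-1)
        _, h_n = self.gru(x)       # h_n: (1, batch, hidden)
        return self.head(h_n[-1])  # per-sequence emotion logits

model = SpeakingExpressionGRU()
logits = model(torch.randn(4, 16, 136), torch.randn(4, 16, 512))
print(logits.shape)  # torch.Size([4, 8])
```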

    Generalized spatio-chromatic diffusion


    Problems with Saliency Maps

    Despite the popularity that saliency models have gained in the computer vision community, they are most often conceived, exploited and benchmarked without taking heed of a number of problems and subtle issues they bring about. When saliency maps are used as proxies for the likelihood of fixating a location in a viewed scene, one such issue is the temporal dimension of visual attention deployment. Through a simple simulation it is shown how neglecting this dimension leads to results that at best cast doubt on the predictive performance of a model and its assessment via benchmarking procedures.
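
    In the spirit of the simple simulation mentioned above (the setup and numbers here are illustrative, not the paper's): a single static map that fits the time-aggregated fixation distribution can still lose substantial predictive power once fixations are scored per time window.

```python
# Sketch: fixations drift from the left to the right half of a 1-D "scene"
# over time; a time-blind map matching the aggregate mispredicts each
# individual time window. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(2)
W = 100

def fixations(t, n=2000):
    """Early fixations cluster left, late fixations cluster right."""
    center = 25 if t == 0 else 75
    return np.clip(rng.normal(center, 8, n).astype(int), 0, W - 1)

def density(fix):
    d = np.bincount(fix, minlength=W).astype(float) + 1e-9
    return d / d.sum()

early, late = fixations(0), fixations(1)
static = density(np.concatenate([early, late]))  # time-blind map

for name, fix in [("early", early), ("late", late)]:
    time_aware = density(fix)
    print(name,
          "static log-lik:", np.log(static[fix]).mean().round(2),
          "time-aware log-lik:", np.log(time_aware[fix]).mean().round(2))
# The static map loses ~log(2) nats per fixation in each window, because it
# spreads mass over locations that are only fixated at other times.
```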

    Modelling task-dependent eye guidance to objects in pictures

    We introduce a model of attentional eye guidance based on the rationale that the deployment of gaze is to be considered in the context of a general action-perception loop relying on two strictly intertwined processes: sensory processing, depending on current gaze position, identifies sources of information that are most valuable under the given task; motor processing links such information with the oculomotor act by sampling the next gaze position and thus performing the gaze shift. In such a framework, the choice of where to look next is task-dependent and oriented to classes of objects embedded within pictures of complex scenes. The dependence on task is taken into account by exploiting the value and the payoff of gazing at certain image patches, or proto-objects, that provide a sparse representation of the scene objects. The different levels of the action-perception loop are represented in probabilistic form and eventually give rise to a stochastic process that generates the gaze sequence. This way the model also accounts for statistical properties of gaze shifts, such as individual scanpath variability. Results of the simulations are compared with experimental data derived both from publicly available datasets and from our own experiments.
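
    A toy sketch of the core sampling step: the next gaze position is drawn from proto-objects in proportion to their task-dependent value, with a soft penalty on long gaze shifts. The patch values, positions, and the softmax rule are hypothetical placeholders for the model's actual probabilistic machinery.

```python
# Sketch: stochastic gaze-shift sampling over proto-objects. Values and
# positions are made-up placeholders for the model's actual quantities.
import numpy as np

rng = np.random.default_rng(3)

centres = rng.random((12, 2))  # proto-object centres in a unit image
task_value = rng.random(12)    # task-dependent payoff of gazing at each patch

def next_gaze(current_xy, beta=4.0):
    """Sample the next fixation: high-value patches are preferred, with a
    soft cost on gaze-shift amplitude (both choices are assumptions)."""
    dist = np.linalg.norm(centres - current_xy, axis=1)
    score = beta * task_value - dist     # value minus shift cost
    p = np.exp(score - score.max())
    p /= p.sum()
    return centres[rng.choice(len(centres), p=p)]

gaze = np.array([0.5, 0.5])              # start at image centre
scanpath = []
for _ in range(6):
    gaze = next_gaze(gaze)
    scanpath.append(gaze.round(2))
print(scanpath)
```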

    Detecting expert’s eye using a multiple-kernel Relevance Vector Machine

    Decoding mental states from the pattern of neural activity or overt behavior is an intensely pursued goal. Here we applied machine learning to detect expertise from the oculomotor behavior of novice and expert billiard players during free viewing of a filmed billiard match with no specific task, and in a dynamic trajectory prediction task involving ad-hoc, occluded billiard shots. We have adopted a sound framework for feature space fusion and a Bayesian sparse classifier, namely, a Relevance Vector Machine. By testing different combinations of simple oculomotor features (gaze shift amplitude and direction, and fixation duration), we could classify on an individual basis which group - novice or expert - the observers belonged to, with an accuracy of 82% and 87% for the match and the shots, respectively. These results provide evidence that, at least in the particular domain of billiard sport, a signature of expertise is hidden in very basic aspects of oculomotor behavior, and that expertise can be detected at the individual level both under ad-hoc testing conditions and under naturalistic conditions, given suitable data mining. Our procedure paves the way for the development of a test for the “expert’s eye”, and promotes the use of eye movements as an additional signal source in Brain-Computer Interface (BCI) systems.
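
    A rough sketch of the feature-fusion idea: per-feature kernels (amplitude, direction, duration) are combined into a single kernel before classification. Since scikit-learn provides no Relevance Vector Machine, an SVM with a precomputed kernel stands in for the Bayesian sparse classifier; all data and weights below are synthetic placeholders.

```python
# Sketch: multiple-kernel fusion of oculomotor features for novice-vs-expert
# classification. SVC(kernel="precomputed") is a stand-in for the paper's
# Relevance Vector Machine; data and kernel weights are synthetic.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(4)
n = 40
# One row per observer: summary statistics of gaze-shift amplitude,
# gaze-shift direction, and fixation duration (all synthetic).
amplitude = rng.random((n, 3))
direction = rng.random((n, 3))
duration = rng.random((n, 3))
y = np.repeat([0, 1], n // 2)  # 0 = novice, 1 = expert

# Weighted sum of per-feature RBF kernels: the simplest kernel fusion.
weights = [0.4, 0.3, 0.3]      # illustrative, not learned here
K = sum(w * rbf_kernel(X) for w, X in
        zip(weights, (amplitude, direction, duration)))

clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```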

    Bayesian Integration of Face and Low-Level Cues for Foveated Video Coding


    Modelling of content-aware indicators for effective determination of shot boundaries in compressed MPEG videos

    In this paper, a content-aware approach is proposed to design multiple test conditions for shot cut detection, which are organized into a multiple-phase decision tree for abrupt cut detection and a finite state machine for dissolve detection. In comparison with existing approaches, our algorithm is characterized by two categories of content-difference indicators and tests. While the first category indicates the content changes that are directly used for shot cut detection, the second category indicates the contexts under which the content change occurs. As a result, indications of frame differences are tested with context awareness, making the detection of shot cuts adaptive to both content and context changes. Evaluations reported by TRECVID 2007 indicate that our proposed algorithm achieves performance comparable to approaches based on machine learning, while using a simpler feature set and straightforward design strategies. This validates the effectiveness of modelling content-aware indicators for decision making, which also provides a good alternative to conventional approaches in this area.
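
    A condensed sketch of a finite state machine for dissolve detection of the kind described above. The state names, thresholds, and the per-frame content-difference indicator are illustrative stand-ins, not the paper's actual design.

```python
# Sketch: a finite state machine that flags dissolves from a per-frame
# content-difference indicator. A sustained run of moderate differences
# reads as a dissolve; a single spike (abrupt cut) does not.
LOW, HIGH, MIN_LEN = 0.1, 0.3, 3  # illustrative thresholds

def detect_dissolves(diffs):
    """diffs: per-frame content differences; returns (start, end) spans."""
    state, start, spans = "NORMAL", None, []
    for i, d in enumerate(diffs):
        if state == "NORMAL" and d > LOW:
            state, start = "CANDIDATE", i      # gradual change begins
        elif state == "CANDIDATE":
            if d > HIGH or d < LOW:            # too abrupt, or change died out
                if d < LOW and i - start >= MIN_LEN:
                    spans.append((start, i))   # sustained gradual change
                state = "NORMAL"
    return spans

# Frames 3 to 7 form a slow ramp (a dissolve); the isolated spike at
# frame 12 mimics an abrupt cut and is correctly ignored here.
diffs = [0.02, 0.03, 0.02, 0.15, 0.2, 0.22, 0.18, 0.15, 0.03, 0.02,
         0.02, 0.02, 0.9, 0.02]
print(detect_dissolves(diffs))  # [(3, 8)]
```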

    Process Analysis of Visual Search in ADHD, Autism and Healthy Controls - Evidence from Intra-Subject Variability in Gaze Control

    Increased intra-subject variability (ISV), i.e. moment-to-moment fluctuations in performance, is a candidate endophenotype of Attention Deficit Hyperactivity Disorder (ADHD). In light of potential etiological overlap between ADHD and Autism Spectrum Disorder (ASD) (Biscaldi et al., 2015; Rommelse et al., 2011), it is important to study ISV in both aforementioned disorders simultaneously. Here, we broaden the study of ISV from reaction time tasks with manual responses to the ISV of gaze control. Children and adolescents with ADHD, ASD and healthy controls, aged 10-13 years (N = 90; all native German speakers), were invited for an oculomotor testing session. Participants were presented with a visual search task. The task required participants to find a Portuguese target word shown above a grid with multiple Portuguese-German word pairs and to indicate its position by pressing response keys matching the search array. Preliminary analyses have been conducted on moment-to-moment fluctuations in eye movements during the search period. Preliminary results suggest increased ISV in the ADHD group. Our study extends the ISV finding to the oculomotor domain, proposes methods to study ISV in gaze movements, and highlights its relationship with ASD.
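
    A tiny sketch of one way to quantify ISV in gaze control, using the coefficient of variation of fixation durations within a trial. The metric choice and the synthetic durations are assumptions; the study's actual analysis may differ.

```python
# Sketch: intra-subject variability (ISV) as the coefficient of variation
# of fixation durations within a search trial. Durations are synthetic.
import numpy as np

def isv_cv(fixation_durations_ms):
    """Coefficient of variation: sample std over mean."""
    d = np.asarray(fixation_durations_ms, dtype=float)
    return d.std(ddof=1) / d.mean()

steady = [220, 240, 210, 230, 225, 235]       # low moment-to-moment fluctuation
fluctuating = [120, 400, 180, 520, 150, 390]  # high fluctuation
print("low-ISV example:", round(isv_cv(steady), 3))
print("high-ISV example:", round(isv_cv(fluctuating), 3))
```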